Conversation
|
Looks good to me, thanks for working on this! I just updated the I wonder if we should try to resolve all the failing round-trip tests in this PR or save them for another PR -- thoughts? |
I think there are too many disparate non-trivial fixes to have them all in one PR. |
|
Gotcha. I think Kevin (@ekiwi) normally prefers all CI checks to be passing before merging a PR, so I'll defer to him on whether we can leave some round-trip tests failing when we merge this PR (i.e. fix them in a separate future PR). |
|
This looks really good! Thanks for updating this so quickly! I'm happy with this |
Agreed. The check will "pass" now but it will be passing with some examples having
TODO for Ernest: check why the monitor is failing here
TODO for Ernest: check why the monitor is failing here
TODO for Ernest: investigate this (b ought to be defined)
TODO for Ernest: this is probably an off-by-one error (check the waveform manually, it should have length 2)
- If waveform has length 3, there's a bug w/ interpreter
- If waveform has length 2, there's a bug w/ monitor
|
Context for @Nikil-Shyamsunder: Kevin and I discussed the failing round-trip tests in person today. I left comments above on the ones that I need to investigate further (to see if they're bugs with the monitor). Kevin says we should wait until these failing tests are fixed before merging this PR, if that's OK! |
|
sounds like a plan! |
…peat loops as allowed-to-fail for RT
…ps, apply to all tests with repeat loops
|
Added a new CLI flag I added this flag to all the Note: because of how Turnt's overrides work (the same |
This PR focuses on infrastructure, not semantic bug fixes.
We now integrate roundtrip checks with Turnt instead of running only a standalone script.
- roundtrip environment for roundtrip
- .tx file via: scripts/roundtrip_case.py {filename}
- justfile to run roundtrip through Turnt.

For each .tx file:
- // ARGS and // RETURN metadata.
- // RETURN (recorded as skip).
- .tx against monitor-emitted trace block(s) after normalization (get rid of comments, whitespace etc).

Turnt snapshots roundtrip stdout into .rt files

Updated the Turnt output to also have the expected (monitor) traces and the actual (interpreter) trace. So for a succeeding test, it might look like:
This is good for passing examples, because we can check if there is any change in interpreter or monitor behavior that changes how many or what traces show up even for passing tests, instead of a blanket "pass".
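For anyone skimming, the normalize-then-compare step described above could be sketched roughly as follows. This is not the code in scripts/roundtrip_case.py: the function names `normalize` and `trace_matches` are hypothetical, and the sketch assumes that normalization means dropping `//` comments and all whitespace, and that a test passes when the interpreter trace equals at least one monitor-emitted block.

```python
import re


def normalize(block: str) -> str:
    """Drop // comments, then all whitespace (an assumed normalization)."""
    no_comments = re.sub(r"//[^\n]*", "", block)
    return re.sub(r"\s+", "", no_comments)


def trace_matches(monitor_blocks: list[str], interpreter_trace: str) -> bool:
    """True if the interpreter trace equals any monitor block after normalization."""
    target = normalize(interpreter_trace)
    return any(normalize(block) == target for block in monitor_blocks)
```

Removing all whitespace is the bluntest possible choice and could in principle glue two tokens together; a token-aware comparison would be stricter, which is one argument for eventually doing this natively.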
A roundtrip trace might fail like this due to a monitor error:
Or it might fail where the actual trace is not in the monitor output, like this:
Monitor failures due to panics are sanitized, both to get rid of machine-specific thread numbers (observed this on the GitHub Actions CI machine) and because the line numbers in the "panicked at" line will change constantly.

We might also consider opening a low-priority issue to port some of this into Rust (I am thinking specifically of the logic for generating a trace, passing it to the monitor, comparing the resulting traces, etc.) so that we have native support instead of relying on janky string processing.
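To make the panic sanitization concrete, here is a minimal sketch of the idea. It assumes the panic header looks like `thread 'main' (12345) panicked at src/file.rs:42:9:`; the regexes, the `<line>`/`<col>` placeholders, and the name `sanitize_panic` are all illustrative assumptions, not what the script in this PR actually does.

```python
import re

# Assumed shape of the thread id: a parenthesized number after the thread name.
_THREAD_ID = re.compile(r"(thread '[^']*') \(\d+\)")
# Assumed shape of the source location: path:line:col after "panicked at ".
_LOCATION = re.compile(r"(panicked at )(\S+?):\d+:\d+")


def sanitize_panic(stderr: str) -> str:
    """Remove thread ids and mask line/column numbers in panic headers."""
    without_ids = _THREAD_ID.sub(r"\1", stderr)
    return _LOCATION.sub(r"\1\2:<line>:<col>", without_ids)
```

Text that does not look like a panic header passes through unchanged, so the same function can be applied to all monitor stderr before Turnt snapshots it.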